A "stereo" document representation for textual information retrieval
Authors
Abstract
Encouraged by the significant improvement that the DLSI (differential latent semantic indexing) approach, which technically uses two term vectors for each document, achieves over the LSI (latent semantic indexing) approach in textual information retrieval, we propose the concept of a stereo, or multi-perspective, document representation, which is expected to be effective for most textual information retrieval approaches based on the vector space model. We show that the new representation, built from two or more “pictures” of each document taken from different view angles, improves textual document retrieval because it extracts and captures more of the individual features of each document. A Student's t-test on experimental results from the standard Time and ADI corpora shows that the improvements in retrieval performance of the LSI and standard term-vector algorithms using the multi-perspective document representation, over the same algorithms using the traditional single representation, are statistically significant.
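The abstract describes the representation only at a high level. As a minimal sketch under stated assumptions, the Python below contrasts the traditional single term vector per document with a two-view ("stereo") representation inside a plain vector space model. The abstract does not say how the views are actually built, so the helper `two_half_views` (each view is the term vector of one half of the document) and the mean-of-cosines scoring rule in `score` are hypothetical choices for illustration only; the LSI projection and the t-test are not shown.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(u: Dict[str, int], v: Dict[str, int]) -> float:
    """Cosine similarity of two sparse term vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_view(doc: str) -> List[Counter]:
    """Traditional representation: one term vector per document."""
    return [term_vector(doc)]


def two_half_views(doc: str) -> List[Counter]:
    """HYPOTHETICAL stereo views: term vectors of the first and second half.

    How the paper takes its "pictures" is not given in the abstract;
    splitting the text in two is only an illustrative assumption.
    """
    words = doc.split()
    mid = len(words) // 2
    return [term_vector(" ".join(words[:mid])), term_vector(" ".join(words[mid:]))]


def score(query: str, views: Sequence[Counter]) -> float:
    """ASSUMED combination rule: mean cosine between the query and each view."""
    q = term_vector(query)
    return sum(cosine(q, v) for v in views) / len(views)


def rank(query: str, docs: Sequence[str],
         make_views: Callable[[str], List[Counter]]) -> List[int]:
    """Return document indices sorted by descending score (ties by index)."""
    scored = [(score(query, make_views(d)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    docs = [
        "latent semantic indexing maps terms into a reduced space",
        "student t test compares two paired samples",
    ]
    print(rank("latent semantic space", docs, single_view))     # → [0, 1]
    print(rank("latent semantic space", docs, two_half_views))  # → [0, 1]
```

With a single view, `score` reduces to the ordinary cosine between the query and the whole document, so the two representations differ only in how many vectors each document contributes and how their similarities are combined. The toy documents show the data flow, not any claim about retrieval quality.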
Similar resources
Document Title Patterns in Information Retrieval
The document titles give important information about documents. This is why they are frequently used to obtain document keywords. We use them to determine document intentions. To obtain some textual details, we use special information extraction techniques for the construction of extra-topical representations of the documents. This representation reflects a document more completely. A possib...
A New Document Embedding Method for News Classification
Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Effectiveness of additional representations for the search result presentation on the web
The presentation of search results on the web has been dominated by the textual form of document representation. On the other hand, the document’s visual aspects such as the layout, colour scheme, or presence of images have been studied in a limited context with regard to their effectiveness of search result presentation. This article presents a comparative evaluation of textual and visual form...
Journal title:
- JASIST
Volume 57, Issue
Pages -
Publication date: 2006